-
Couldn't load subscription status.
- Fork 6.1k
8370345: Parallel: Rework TLAB accounting in MutableNUMASpace #27935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8370345: Parallel: Rework TLAB accounting in MutableNUMASpace #27935
Conversation
|
👋 Welcome back jsikstro! A progress list of the required criteria for merging this PR into |
|
@jsikstro This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 72 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
|
/cc add hotspot-gc |
|
@jsikstro |
Hello,
Parallel's MutableNUMASpace is the only GC interface that uses the Thread parameter passed through the general CollectedHeap interface to tlab_capacity, tlab_used, and unsafe_max_tlab_alloc. It would be nice if Parallel's MutableNUMASpace could do without the Thread and instead find a thread-agnostic approach. By removing the need for the thread, it becomes possible to clean up the shared CollectedHeap interface, which makes it easier to read and maintain all GCs. Also, the lgrp_id that is stored in the Thread class should really have been moved to GCThreadLocalData after that concept was created, but with a thread-agnostic approach, the field can be removed entirely.
The current solution is not without problems. When a new allocation is made inside one of the LGRP spaces in MutableNUMASpace using cas_allocate(), the NUMA/LGRP id is polled and stored inside the Thread, and we only attempt to allocate on that LGRP. If allocation fails on the local LGRP, we do not try to allocate on any other (remote) LGRP(s). This fact is reflected in the TLAB accounting methods tlab_capacity, tlab_used, and unsafe_max_tlab_alloc, which only check how much memory is used, etc., for the LGRP matching the stored LGRP id in the Thread. This model breaks down when threads are allowed to migrate between different CPUs, and therefore also NUMA nodes, which might change the LGRP id.
For example, a system with two NUMA nodes gives us two LGRPs with ids 0 and 1. If a thread allocates most of its memory on LGRP 0 and then migrates to a CPU on LGRP 1, the thread will show that it allocated a significant amount of memory, but the used memory on the LGRP it is currently on could be very low. This would give a disproportionate allocation fraction. This is not a problem as the TLAB code accounts for this, but for a different reason entirely. The other way around could also be problematic. If a thread allocates very little memory on LGRP 0 and then migrates to LGRP 1, where another thread has allocated a lot of memory, the allocation fraction will be very low, when it could have a really high fraction if accounting for the used memory on its original LGRP.
A solution to both of these issues is to average the capacity, used, and available memory across all LGRPs for the TLAB accounting methods. This approach provides a more accurate and stable view of memory usage and availability, regardless of thread migration or imbalances in NUMA/LGRP allocation. However, there are trade-offs to consider. Averaging the accounting may mask allocation pressure on specific LGRPs and could result in the allocation history of TLABs being updated more/less frequently. In summary, considering the whole eden space (i.e., the entire MutableNUMASpace) instead of just a single LGRP when accounting for TLABs should give a more holistic view.
I plan to clean up the
Thread*parameter to tlab_capacity, tlab_used and unsafe_max_tlab_alloc in a following RFE. Since it is a pretty significant cleanupTesting:
-XX:+UseParallelGC -XX:+UseNUMA. Slight improvement on SPECjbb2005.Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27935/head:pull/27935$ git checkout pull/27935Update a local copy of the PR:
$ git checkout pull/27935$ git pull https://git.openjdk.org/jdk.git pull/27935/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27935View PR using the GUI difftool:
$ git pr show -t 27935Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27935.diff
Using Webrev
Link to Webrev Comment